This dataset consists of two kinds of information: 
1) The file real_metrics.txt, a table in which each row corresponds to the syntactic dependency tree of a real sentence and whose columns, separated by semicolons, are:
- n, the number of nodes of the tree (the sentence length).
- D, the sum of dependency lengths (in words).
- collection, the treebank collection the sentence comes from: UD (Universal Dependencies), Prague, or Stanford.
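As an illustration, real_metrics.txt can be read with Python's standard `csv` module by setting the delimiter to a semicolon. This is a minimal sketch, assuming the column order n;D;collection and that D is stored as an integer; whether the file has a header line is not stated, so any line whose first field is not an integer is skipped. The helper names `parse_real_metrics` and `mean_d_by_collection` are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_real_metrics(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Parse lines of real_metrics.txt into (n, D, collection) tuples.

    Assumes the column order n;D;collection. Lines whose first field is not
    an integer (e.g. a header, if there is one) are skipped.
    """
    rows = []
    for record in csv.reader(lines, delimiter=";"):
        if len(record) < 3 or not record[0].strip().isdigit():
            continue  # blank line, header or malformed record
        rows.append((int(record[0]), int(record[1]), record[2].strip()))
    return rows


def mean_d_by_collection(rows: List[Tuple[int, int, str]]) -> Dict[str, float]:
    """Average sum of dependency lengths D for each treebank collection."""
    sums: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [total D, count]
    for _n, d, collection in rows:
        sums[collection][0] += d
        sums[collection][1] += 1
    return {c: total / count for c, (total, count) in sums.items()}
```

A typical call would be `with open("real_metrics.txt", newline="", encoding="utf-8") as f: rows = parse_real_metrics(f)`; the UTF-8 encoding is an assumption.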

2) A multivolume zip archive (GeneratedTreesAndGrammarClasses.zip) containing files named length_n.trees and length_n.metrics, where n, an integer from 1 to 25, is the number of nodes.
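Python's standard `zipfile` module does not handle multi-disk (split) archives, so the sketch below assumes the volumes have first been merged into a single archive, for example with Info-ZIP's `zip -s 0 GeneratedTreesAndGrammarClasses.zip --out joined.zip` (the output name joined.zip is hypothetical). It streams one member line by line without extracting it, which helps with large files. Matching members by base name (in case they sit in a subdirectory) and the UTF-8 encoding are assumptions; `iter_member_lines` is a hypothetical helper.

```python
import io
import zipfile
from typing import BinaryIO, Iterator, Union


def iter_member_lines(archive: Union[str, BinaryIO], filename: str) -> Iterator[str]:
    """Yield the lines (newline stripped) of `filename` inside a single zip archive.

    The member is matched by its base name, so it may live in a subdirectory.
    Assumes a non-split archive and UTF-8 text.
    """
    with zipfile.ZipFile(archive) as zf:
        matches = [m for m in zf.namelist() if m.rsplit("/", 1)[-1] == filename]
        if not matches:
            raise KeyError(f"{filename} not found in archive")
        # Wrap the binary member stream so it can be read as text, line by line.
        with io.TextIOWrapper(zf.open(matches[0]), encoding="utf-8") as text:
            for line in text:
                yield line.rstrip("\r\n")
```

For instance, `for line in iter_member_lines("joined.zip", "length_12.trees"): ...` would visit the sampled trees with 12 nodes one at a time.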

The length_n.trees file contains a list of dependency trees with n nodes, one per line. Each line contains two fields separated by a semicolon. The first field represents a tree as a space-separated list of head indexes: the Xth index is the position of the head of the Xth word, and 0 means that the Xth word is the root node. The second field is a comma-separated list of the mildly non-projective classes of trees that the tree belongs to (out of 1ec, mh4, mh5, pla, prj, wg1). For each n from 1 to 10, length_n.trees lists all possible dependency trees with n nodes; for n > 10, it contains a random sample of them.
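A line of length_n.trees can be split into its head vector and its class set as sketched below. Since 0 denotes the root, we assume words are numbered from 1, so `heads[i - 1]` holds the head of word i. The well-formedness check (exactly one root, no cycles) is an optional sanity test added for illustration; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def parse_tree_line(line: str) -> Tuple[List[int], Set[str]]:
    """Split 'h1 h2 ... hn;class1,class2,...' into a head list and a class set.

    heads[i - 1] is the head of word i (words assumed numbered from 1); 0 marks the root.
    """
    tree_field, _, class_field = line.strip().partition(";")
    heads = [int(tok) for tok in tree_field.split()]
    classes = {c.strip() for c in class_field.split(",") if c.strip()}
    return heads, classes


def is_well_formed(heads: List[int]) -> bool:
    """Check that the head list describes a single-rooted tree over words 1..n."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1 or any(h < 0 or h > n for h in heads):
        return False
    for word in range(1, n + 1):
        seen = set()
        node = word
        while node != 0:  # climb towards the root; revisiting a node means a cycle
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True
```

For example, `parse_tree_line("2 0 2;1ec,mh4,prj")` gives the three-word tree rooted at word 2, with words 1 and 3 as its dependents.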

The length_n.metrics file contains metrics for each tree in length_n.trees, one tree per line. The two files are parallel: the Xth line of length_n.metrics refers to the same tree as the Xth line of length_n.trees. Each line contains three columns separated by semicolons:
- the number of nodes of the tree (i.e. the sentence length, n)
- the sum of dependency lengths in the tree (D) 
- a comma-separated list of the mildly non-projective classes of trees that the tree belongs to (out of 1ec, mh4, mh5, pla, prj, wg1).
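Because the two files are parallel, D can be recomputed from the head vector and compared with the metrics file. The sketch below uses the usual definition D = Σ |i − h(i)| over all non-root words i, assuming (as is standard) that the root contributes no dependency length; `check_parallel` and the other names are hypothetical. Note that `zip` stops at the shorter input, so a difference in line counts would need a separate check.

```python
from typing import Iterable, List, Tuple


def sum_dependency_lengths(heads: List[int]) -> int:
    """D = sum of |i - head(i)| over all non-root words i (words numbered from 1)."""
    return sum(abs(i - h) for i, h in enumerate(heads, start=1) if h != 0)


def parse_metrics_line(line: str) -> Tuple[int, int, List[str]]:
    """Split 'n;D;class1,class2,...' into (n, D, classes)."""
    n_field, d_field, class_field = line.strip().split(";", 2)
    classes = [c.strip() for c in class_field.split(",") if c.strip()]
    return int(n_field), int(d_field), classes


def check_parallel(tree_lines: Iterable[str], metrics_lines: Iterable[str]) -> int:
    """Compare length_n.trees and length_n.metrics line by line; return the mismatch count."""
    mismatches = 0
    for tree_line, metrics_line in zip(tree_lines, metrics_lines):
        heads = [int(tok) for tok in tree_line.split(";", 1)[0].split()]
        n, d, _classes = parse_metrics_line(metrics_line)
        if n != len(heads) or d != sum_dependency_lengths(heads):
            mismatches += 1
    return mismatches
```

For the tree `2 0 2`, the two arcs have length 1 each, so D = 2.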

The class codes are:
- 1ec for 1-Endpoint-Crossing
- mh4 for Multi-Headed with at most $4$ heads per item
- mh5 for Multi-Headed with at most $5$ heads per item
- pla for planar
- prj for projective
- wg1 for well-nested with gap degree at most 1
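The codes can be kept in a lookup table, and the simplest class, prj, can be checked independently. The sketch below uses the textbook definition of projectivity (every word strictly between a head and its dependent is dominated by that head); it is not necessarily the procedure used to build the dataset, and it assumes a well-formed (acyclic) head list, otherwise `dominates` would not terminate. `CLASS_NAMES`, `dominates` and `is_projective` are hypothetical names.

```python
from typing import Dict, List

# Class codes as listed above.
CLASS_NAMES: Dict[str, str] = {
    "1ec": "1-Endpoint-Crossing",
    "mh4": "Multi-Headed with at most 4 heads per item",
    "mh5": "Multi-Headed with at most 5 heads per item",
    "pla": "planar",
    "prj": "projective",
    "wg1": "well-nested with gap degree at most 1",
}


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """True if `ancestor` is `node` or lies on the path from `node` up to the root."""
    while node != 0:
        if node == ancestor:
            return True
        node = heads[node - 1]
    return ancestor == 0


def is_projective(heads: List[int]) -> bool:
    """Each word strictly between a head and its dependent must be dominated by that head."""
    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root has no incoming arc to check
        lo, hi = min(dep, head), max(dep, head)
        if not all(dominates(heads, head, k) for k in range(lo + 1, hi)):
            return False
    return True
```

For a line of length_n.trees one would expect `is_projective(heads)` to hold exactly when `prj` is among its classes, which makes a cheap consistency check.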

